[pull] master from DataDog:master #543
Merged
* [Backport] [7.80.x] Release integrations
* Apply changelog wording fixes from review

  Fix trailing periods, capitalization, and minor wording per review:
  - sqlserver: capitalize 'Updates', use 'Datadog' instead of 'we', add periods
  - postgres: add trailing periods to 4 entries
  - mysql: add trailing periods to 2 entries
  - guarddog: add trailing period
  - kafka_actions: 'via' -> 'through', 'e.g.' -> 'such as'
  - cisco_aci: 'only to send' -> 'only sent', add period

Co-authored-by: dkirov-dd <166512750+dkirov-dd@users.noreply.github.com>
* Fix n8n metric mappings and add full v2 metric coverage

  - Drop fabricated metric names that n8n never emitted; map only what is empirically present.
  - Add the n8n 2.x metric families: workflow.execution.duration histogram, audit.workflow.*, embed.login.*, token.exchange.*, process.pss.bytes, runner.task.requested, and the workflow_statistics gauges.
  - Add worker-only families (node.started, node.finished, queue.job.dequeued, runner.task.requested) by introducing a worker-scrape instance.
  - Stop gating the OpenMetrics scrape on /healthz/readiness; emit n8n.readiness.check unconditionally so metrics still flow when the readiness endpoint is unhealthy.
  - Replace the custom Dockerfile with a direct n8nio/n8n image reference and parameterise the version via hatch.toml so the test matrix can run against both 1.118.1 and 2.19.5.
  - Allocate free host ports via datadog_checks.dev.utils.find_free_ports and forward them through docker_run env_vars to avoid port collisions on re-runs.

* Add changelog for PR #23635
* Refine n8n metric coverage and e2e setup
* Document raw_metric_prefix requirement when customizing N8N_METRICS_PREFIX
* Reformat changelog so towncrier renders sub-bullets correctly
* Add tests/lab traffic generator for n8n

  A long-running n8n simulation that layers on top of the integration test environment so a real Datadog Agent can ship metrics to a Datadog org for dashboard / monitor iteration.

  - tests/lab/workflows/: five lab-only workflow JSONs covering distinct shapes (fast, slow Wait node, always-fail Code, flaky 30%, four-step chain).
  - tests/lab/traffic_generator.py: click CLI (start/generate/stop) that runs ddev env start --base, copies + imports + activates the lab workflows, restarts n8n, and drives a configurable async traffic mix against the webhooks and REST API.
  - tests/lab/config.yaml: webhook + REST probabilities and tick / reload intervals; hot-reloaded while the generator runs.
  - tests/lab/.ddev.toml: pins the lab to an `n8nlab` ddev org.
  - tests/lab/run_lab.sh: bash entrypoint with an EXIT trap so Ctrl+C always runs lab:stop.
  - hatch.toml: new [envs.lab] env with click/httpx/pyyaml/rich and start/generate/stop scripts.

* Add missing n8n event metric mappings
* Add VM-isolated expression engine metrics (n8n 2.x)

  n8n 2.x ships a new VM-isolated expression engine in @n8n/expression-runtime that registers its own Prometheus metrics under the n8n_expression_* prefix. The metrics are gated on N8N_EXPRESSION_ENGINE=vm and N8N_EXPRESSION_ENGINE_OBSERVABILITY_ENABLED=true, neither of which defaults to on, so live containers do not emit samples unless explicitly opted in.

  Map the family in datadog_checks/n8n/metrics.py, add metadata.csv rows with the version + flag requirements documented, add synthetic samples to the unit fixtures so check_symmetric_inclusion stays green, list the metrics in the V2_ONLY / RARE_EVENT sets used by the integration assertions, and call out the env vars in the README's version-specific block.

* Split lab into its own compose, mount workflows by bind

  The lab previously shared tests/docker/docker-compose.yaml with the test env and drove workflow import through docker exec in traffic_generator.py. That coupled two consumers with different port expectations (the test env uses find_free_ports for parallel safety; the lab needs a fixed URL for docs, agent config, and the traffic generator) and put workflow lifecycle in two places.

  Add tests/lab/docker-compose.yaml that hardcodes 5678/5680 and bind-mounts both the test fixtures and the lab workflows under /workflows/. Gate the compose + port selection in tests/common.py on N8N_IS_LAB so the same conftest serves both modes. Move workflow import/activate into conftest (scanning the bind-mount, reading stable ids from JSON), and add a lab-only logs block + docker_volumes yield so the Datadog Agent picks up n8n stdout via autodiscovery and the event-bus log files via the data volume.
  Drop the docker-exec workflow import from traffic_generator.py now that conftest owns it. Update the README log-collection section to reflect that event-bus logs live under the n8n user folder rather than N8N_LOG_FILE_LOCATION.

* Address PR review feedback

  - Tighten _generate_workflow_traffic success check to == 200 so a webhook that responds 4xx (e.g. not yet registered after restart) does not falsely count as a healthy workflow run; capture last_status / last_exc and surface them in the RuntimeError so CI failures point at the real cause.
  - Replace the bespoke time.monotonic() wait loops with WaitFor + raise predicates (the dominant pattern across integrations-core). Restructure dd_environment conditions so the docker_run condition chain runs: wait for /healthz, activate workflows, then assert /metrics reachable on main + worker. Workflow-started wait stays inline since _generate_workflow_traffic is not idempotent.
  - Drop drop_rare_event_metrics; pass the public exclude= parameter to assert_metrics_using_metadata so we don't reach into AggregatorStub internals.
  - Replace bare try/except RequestException: pass blocks with contextlib.suppress.
  - Parametrize the two unit-fixture metadata tests; add the missing pytestmark = pytest.mark.unit and a comment explaining why the unit assertion is version-pinned to major=2.
  - Re-word the lab traffic generator reload-failure messages so it's clear the lab keeps running with the previous config.
  - Add N8N_METRICS_INCLUDE_WORKFLOW_EXECUTION_DURATION to the README's version-specific block and to the changelog flag list; indent the changelog sub-bullets so towncrier nests them under the wrapping bullet.

* Fix e2e test referencing removed drop_rare_event_metrics helper

  Use the public exclude= parameter on assert_metrics_using_metadata, matching test_integration.py. test_e2e.py was missed in the earlier review-feedback commit.
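
The tightened success check from the review-feedback commit can be illustrated with a minimal sketch. This is not the code from the PR: the function name `generate_workflow_traffic`, the injected `send` callable (standing in for the HTTP call to the test webhook), and the `attempts` parameter are all hypothetical. Only the `== 200` comparison and the `last_status` / `last_exc` reporting come from the commit message.

```python
from typing import Callable, Optional


def generate_workflow_traffic(send: Callable[[], int], attempts: int = 3) -> None:
    """Call send() until it returns exactly 200, or raise with the last failure.

    `send` is a hypothetical stand-in for the webhook request; it returns
    the HTTP status code or raises on a transport error.
    """
    last_status: Optional[int] = None
    last_exc: Optional[Exception] = None
    for _ in range(attempts):
        try:
            last_status = send()
        except Exception as exc:
            # Remember the most recent error so the final report can show it.
            last_exc = exc
            continue
        # Strict equality: a 4xx from a webhook that is not registered yet
        # must not be counted as a healthy workflow run.
        if last_status == 200:
            return
    raise RuntimeError(
        f"webhook never returned 200 (last_status={last_status}, last_exc={last_exc!r})"
    )
```

Passing the request in as a callable keeps the sketch free of network access; a real fixture would presumably issue the request directly.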
* Proofread n8n README against the Datadog style guide

  - Remove stray scratch notes accidentally committed at the end of the file (numbered questions and a changelog-process note that didn't belong in the public README).
  - Sentence-case the 'Data collected' and 'Service checks' headings.
  - Replace hyphen-as-em-dash usage (' - ') by splitting into separate sentences.
  - Replace slash-as-and/or in lists and tag descriptions: 'enqueued/dequeued/completed/failed/stalled counters' -> spelled-out list; 'result:success/failure' -> 'result:success or result:failure'; 'stdout/stderr' -> 'stdout and stderr'.

* Move workflow setup back into docker_run conditions to fix e2e

  In the previous refactor _generate_workflow_traffic and the _workflow_started_non_zero wait were moved into the body of the dd_environment context manager. That made them vulnerable to fixture re-invocation paths (e.g. session teardown or flaky-plugin retry) that fired the body code against torn-down containers, producing a setup error after the e2e test had already passed.

  Put both back into conditions=[...]. That keeps them inside docker_run's set_up() retry envelope (attempts=2 in CI), and they are no longer exposed to the post-yield teardown path. The post-restart /healthz wait moves back inside _activate_imported_workflows so the function stays self-contained as a condition. Restore the (instances, E2E_METADATA) tuple yield for non-lab mode so the e2e Agent container still gets the docker_volumes mount it expects.

* Address second-round PR review feedback

  - conftest.py: parse n8n_workflow_started_total samples as floats instead of string-matching ' 0', so '0.0' / '0e+0' counter values are not treated as non-zero and OpenMetrics '# HELP'/'# TYPE' comment lines that share the prefix are skipped.
  - common.py: collapse the get_all_metadata_metrics passthrough into get_metadata_metrics_for_version (update integration + e2e call sites) and document the intentional V2_ONLY / RARE_EVENT overlap so future contributors do not assume the duplication is accidental.
  - check.py: cache the readiness endpoint with functools.cached_property (it is derived from immutable config) and parameterise the dict return / argument types as dict[str, Any].
  - traffic_generator.py: scope the asyncio.Event and current config to _run_traffic instead of holding them at module level, threading both through _config_reloader. Switches the SIGINT/SIGTERM hook to loop.add_signal_handler so a second 'generate' invocation in the same process starts from a clean state.

* Address third-round PR review feedback

  - conftest.py: move the worker CheckEndpoints to after _activate_imported_workflows so any cascade from the n8n main restart is caught before downstream conditions scrape the worker.
  - test_unit.py: import the requests module and reference requests.ConnectionError at the call site so the builtin ConnectionError name is not shadowed for the rest of the module.
  - traffic_generator.py: extract _make_output_table() so the table schema lives in one place and _print_row() only owns row data.

* Wait for webhook registration after n8n restart on v2

  On n8n 2.x, /healthz comes back after `docker compose restart n8n` before n8n has finished re-registering the active workflows' webhook routes. The existing WaitFor(_n8n_healthy) inside _activate_imported_workflows was satisfied while /webhook/test still returned 404, so _generate_workflow_traffic raced the registration and failed with last_status=404.

  Add a second WaitFor poll on the integration-test webhook itself so the registration is observed before downstream conditions run. v1 happens to register fast enough that the gap is not observable there, but the extra check costs at most one poll on the happy path.
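
The extra wait on the webhook can be pictured as a plain polling loop. The sketch below is a generic illustration and not the `WaitFor` helper the commit uses; `wait_for_status`, `probe`, `attempts`, and `wait` are hypothetical names, and `probe` stands in for a request to the integration-test webhook that returns its HTTP status code.

```python
import time
from typing import Callable, Optional


def wait_for_status(
    probe: Callable[[], int],
    expected: int = 200,
    attempts: int = 60,
    wait: float = 1.0,
) -> int:
    """Poll probe() until it returns `expected`; return the number of polls used."""
    last: Optional[int] = None
    for poll in range(1, attempts + 1):
        last = probe()
        if last == expected:
            # Registration observed; downstream steps can run now.
            return poll
        time.sleep(wait)
    raise TimeoutError(f"still {last} after {attempts} polls")
```

When the route is already registered, the loop returns after a single poll, which matches the "at most one poll on the happy path" cost described in the commit message.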
* Map n8n event-bus dynamic counters

  Map the broader n8n event-bus surface (~45 dynamic counters) covering audit (user, credentials, package, variable, execution data), AI node, runner, and workflow cancellation events, plus execution throttling. Counter names rejected by n8n's own prom-client validation (hyphenated families such as external-secrets, token-exchange, role-mapping, and cluster) are intentionally not mapped and called out in metrics.py.

  The integration test environment cannot realistically exercise these families end to end, so each new metric is documented as best-effort in metadata.csv and added to RARE_EVENT_METRIC_NAMES. The unit fixture carries synthetic samples so the metric map stays validated. README covers the dynamic-counter scope and shows an extra_metrics example for users to add events from future n8n releases.

* Drop technical hyphen-rejection paragraph from n8n README
* Tighten n8n changelog to one-line themes
* Tone down n8n changelog lead-in
* Reframe n8n changelog from user perspective
* Treat any 2xx response as ready, bump n8n to a major release

  The readiness gauge now reports 1 for any HTTP 2xx response on /healthz/readiness, not only 200. Rename the changelog entry from .added to .changed so the next release is a major bump, reflecting the breadth of the integration overhaul.
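
The readiness rule in the last commit amounts to a range check on the status code. The following is a minimal sketch and not the check's actual code: `readiness_value` is a hypothetical name, and reporting 0 for non-2xx responses is an assumption, since the commit message only states that 2xx maps to 1.

```python
def readiness_value(status_code: int) -> int:
    """Return 1 for any HTTP 2xx status code, otherwise 0 (the 0 is assumed)."""
    return 1 if 200 <= status_code < 300 else 0
```

Under this rule a 204 from /healthz/readiness counts as ready, whereas a strict `== 200` comparison would have reported it as not ready.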
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)
Can you help keep this open source service alive? 💖 Please sponsor : )